ABSTRACT



An enhancement to computer network maintenance tech- 
nology which reduces redundant and inaccurate fault report- 
ing and alerting based upon implementation of logic which 
determines the most likely single point of failure. In modern 
computer and telephone networks, certain single points of 
failure result in the false appearance of multiple failures. 
However, by analyzing the pattern of apparent failures in 
view of the known network topology, a single point of 
failure can be determined as the root cause of the multiple 
failure indications. An enhancement to the currently- 
available network maintenance technology, including soft- 
ware applications executing on network server platforms, 
provides this fault determination logic, filters spurious and 
incorrect failure reports, and posts failure reports only for 
the single point failure. 

24 Claims, 6 Drawing Sheets 
NETWORK FAULT ALERTING SYSTEM
AND METHOD 

BACKGROUND OF THE INVENTION 

1. Field of the Invention 

This invention pertains to the arts of computer network 
management, and especially to the management of network 
bandwidth consumed by network management, status, and 
maintenance messages. More particularly, this invention 
relates to the arts of intelligent processing and diagnosis of 
network failures and problems based on fault analysis logic
to more accurately detect and isolate computer network 
problems, to minimize the network bandwidth consumed by 
maintenance messages, and to effectively notify mainte- 
nance personnel of the most likely point of failure. 

2. Description of the Related Art 

Computer networks, such as local area networks
("LAN"), wide-area networks ("WAN"), intranets and the
Internet typically include substantial maintenance and moni- 
toring capabilities. Modern telephone networks, such as 
Signaling System 7 ("SS7"), Integrated Services Digital
Network ("ISDN"), and many digital cellular networks
including GSM, also include substantial equipment and software
which are dedicated to the provisioning, monitoring and 
maintenance of the network and its equipment. All of the 
above named networks are packet-based networks, and are 
well-known within their respective arts. 

Key to the architecture and operation of these networks 
are packet routers, which interconnect multiple physical 
networks and provide routing and forwarding of packets, or 
"messages", from one network to another based upon 
addressing schemes defined by well-known protocols such 
as the Internet Protocol ("IP") or LAPD for SS7 and ISDN. 
These addressing schemes can be generalized as schemes 
which define each data packet or message as having a
header, payload, and tail. The destination address, origina- 
tion address, packet sequence number, and payload size are 
typically included in the header section of the message. The 
payload section contains the actual computer data which is 
being transferred from one computer to another via the 
computer network, which may represent a portion of a 
computer file, a formatted message, or a section of digitized 
signal such as voice, video or other audio. The various 
message formats are defined by well-known standards pro- 
mulgated by InterNIC, the International Telecommunica- 
tions Union, Bellcore, and ANSI. 

In order to manage these networks, including monitoring 
of network operation status, configuring and re-configuring 
network elements (routers, terminals and switches), and 
provisioning of new network sections, a number of well- 
known software and hardware products have been devel- 
oped and placed on the market. Most of these products 
integrate specialized software onto network server plat- 
forms. The software uses the network connectivity and 
bandwidth provided by the network server platform to 
perform maintenance testing, messaging, status checking, 
and alert messaging. Many times, the actual network being 
used for "real" traffic, such as computer file transmission or 
telephone call transmission, is used for the maintenance 
communications as well. In this case, the maintenance 
messages "mix in" with the bandwidth of the "real" traffic. 
As such, if maintenance messages accumulate to significant
bandwidth consumption, network performance may be 
adversely affected. In other cases, separate networks dedi- 
cated to maintenance may be configured to avoid this 
problem. But, even so, if maintenance messages exceed an
expected bandwidth level, the dedicated maintenance net- 
work may fail. 

When network management software such as Netview/6000,
Hewlett-Packard's OpenView, or others detects that a
network device such as a router has gone off-line, it will send
"node down" events or messages for all the workstations
connected downstream from the off-line router to the network
problem management server. The network problem management
server provides correlation and processing for opening
trouble tickets, and eventually it sends alerts to the appropriate
maintenance personnel through pagers, e-mail, and/or telephone
calls.

FIG. 1 shows the topology of prior art maintenance 
systems. A router (1) may have multiple ports to multiple 
networks. Each port is serviced by a network interface card
("NIC"), such as an Ethernet LAN interface card. FIG. 1
shows an example of a router serving three networks, A, B,
and C, each of which is a group of networked computer
workstations or personal computers. For example, network
A (5) has several "drops" to computers, and one drop or 
connection (6) to the router. Likewise, network B (4) is 
connected (3) to the router, and network C (2) is connected 
(7) to the router. Packets or messages received by the router 
are forwarded to other networks based on the addressing 
scheme of the network, such as IP in the case of many 
computer networks. 

Also shown in FIG. 1 is a connection (8) to a maintenance 
server (9) such as a NetView 6000 server. In this example, 
this connection (8) connects to the router (1) using the 
router's NIC for network D. The maintenance server (9) 
typically contains a connectivity database which contains all 
of the network addresses of all the elements on the other
networks connected to the router, such as all the computers
connected to networks A, B, and C. Using this database, the
maintenance server (9) periodically sends status query
messages, or "pings", to each of the computers. If each
computer is on-line, the router is functioning properly, and
the network physical media (cable, RF links, etc.) is intact,
a reply will be received from each computer nearly
immediately in response to the "ping". If a reply or response is not
received within a certain time from transmitting of the
"ping", the maintenance server (9) may assume that a problem
with the computer, router, or network(s) exists.
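
The polling round described above can be illustrated with a minimal sketch. It assumes a hypothetical `probe` callable that returns the reply latency in seconds, or `None` when no reply arrives; the function name `sweep` and the sample addresses are likewise illustrative and are not part of NetView or any other product named here.

```python
from typing import Callable, Dict, Iterable, Optional


def sweep(addresses: Iterable[str],
          probe: Callable[[str], Optional[float]],
          time_limit: float) -> Dict[str, bool]:
    """Run one round of status queries.

    Returns a mapping of address -> True if a reply arrived within
    `time_limit` seconds, False otherwise. `probe` is a hypothetical
    stand-in for sending a "ping" and measuring the reply latency.
    """
    status = {}
    for addr in addresses:
        latency = probe(addr)
        status[addr] = latency is not None and latency <= time_limit
    return status


# Simulated replies: one workstation is silent, one answers too late.
replies = {"10.0.1.1": 0.02, "10.0.1.2": None, "10.0.1.3": 7.5}
result = sweep(replies, replies.get, time_limit=5.0)
```

A late reply is treated the same as a missing one, which matches the text: only replies received within the time limit count as evidence that the element is up.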

For example, if all computers and the router are function- 
ing correctly except for one computer, then only one 
response will not be received, and all other responses will be 
received. However, if the router fails, no responses will be
received from any of the computers. In the most basic of
maintenance system configurations, such as the basic
NetView 6000 product, this scenario can result in a storm of
events being sent to the problem management server, which
correlates events and opens trouble tickets, leading to many
useless and/or redundant e-mails and pager messages.

FIG. 2 illustrates this scenario. A normal "ping" (20) is 
forwarded from the NetView 6000 to the router, which
forwards (21) it to the appropriate PC. The PC, if functioning
properly, replies (22) via the router to the NetView 6000
(23) within a predetermined time limit t1. If the router has
failed, the "ping" (24) will not be replied to by any of the
computers within time t1, which will result in the NetView
6000 sending multiple "computer down" messages (25) to
the problem management server. The problem management
server is configured to wait a period of time t2 before
escalating the event to notification of the maintenance
personnel, in order to reduce the number of alerts made for
temporary problems such as power glitches, computer
reboots, etc. But, if no "computer up" messages are received
within time limit t2, the problem management server will
send multiple pager messages and telephone calls, and may
open multiple trouble tickets (26), as many as one per
computer on the network. This results in the alerting of
the maintenance personnel, but leaves the personnel confused
as to which element has actually failed. Additionally, the
network link between the NetView 6000 server and the
problem management server has suffered unnecessary
bandwidth consumption by all of the "computer down"
messages.

In an enhancement of the prior art network management
technology, a product called Tivoli for Network Connectivity
module (TFNC) by International Business Machines
("IBM") employs a similar concept, but it adds some
intelligent processing to the maintenance server. With TFNC, all
of the original "computer down" messages will be sent to the
problem management server, but, as shown in FIG. 3, the
Tivoli processing (30) will examine the network topology
and determine that all of these failures are likely due to a
single point failure, namely a router failure. So, within the
escalation time period t2, TFNC will send multiple
"computer up" messages (31) to the problem management server,
which results in a net status of only the "router down"
message being escalated by the problem management server.
While this enhancement to the network maintenance
technology produces a desirable reduction in the number of
alerts (pager messages, trouble tickets, etc.) (32) issued to
maintenance personnel, it does not reduce the bandwidth
consumed by the messages on the network between the
maintenance server (TFNC and NetView 6000) and the
problem management server. Rather, it
nearly doubles the bandwidth consumption.
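
The bandwidth argument can be made concrete with a small counting sketch. The function `event_counts` and its labels are illustrative only. The TFNC figure assumes one "computer up" retraction per original "computer down" event plus one "router down" event, which is one reading of the description above rather than a documented product behavior.

```python
def event_counts(workstations: int) -> dict:
    """Messages sent to the problem management server when one router
    serving `workstations` computers fails."""
    return {
        # Basic prior art: one "computer down" per workstation.
        "basic": workstations,
        # TFNC-style: the same "computer down" events, one "computer up"
        # retraction for each, plus one "router down" (assumed here).
        "tfnc": 2 * workstations + 1,
        # A single root-cause message, as the text argues is needed.
        "single_point": 1,
    }


counts = event_counts(30)
```

For 30 workstations this gives 30 messages for the basic configuration, 61 with the retraction scheme, and 1 when only the root cause is reported.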

Therefore, there is a need in the art for a system and 
method which intelligently processes the "ping" response
pattern in a timely manner, and which issues a minimal 
number of "network element down" messages which pre- 
cisely isolate the most likely point of failure in order to 
minimize network bandwidth consumption, and to minimize 
redundant and incorrect maintenance alerts. 

BRIEF DESCRIPTION OF THE DRAWINGS 

The following detailed description, when taken in
conjunction with the figures presented herein, presents a complete
description of the present invention.

FIG. 1 shows the prior art topology for network manage- 
ment servers, software, and connectivity. 

FIG. 2 discloses the message sequence used in prior art 
network management technology.

FIG. 3 discloses the enhanced prior art network manage- 
ment technology message sequence. 

FIG. 4 illustrates the functional flow of the inventive 
method which filters and diagnoses the most likely point of
failure in the network. 

FIG. 5 shows the modified network topology to include a 
system which implements the inventive method. 

FIG. 6 shows the message sequence achieved by use of
the inventive method, with substantially reduced network
bandwidth requirements and increased accuracy of the 
alerts. 

SUMMARY OF THE INVENTION 

The foregoing and other objects, features and advantages
of the invention will be apparent from the following more
particular description of a preferred embodiment of the
invention, as illustrated in the accompanying drawings
wherein like reference numbers represent like parts of the 
invention. 

The inventive method is preferably implemented as a 
software application which will integrate with the existing 
network management software packages and servers, such 
as Netview/6000, Hewlett-Packard OpenView, and others.
The new software application implements the following
general method or logic: 

(a) When a router or a network device goes off-line, then 
it will send only one "network element or router down" 
event to the problem management server which does 
the correlation and issues the trouble tickets for alert- 
ing. Thus when the router down (network device) event 
is sent via a pager or email, the network operations 
personnel will know the router is down, and it is 
obvious that the devices connected downstream will be 
offline from the entire network; 

(b) When a router NIC, port or interface goes off-line, the 
same logic should result in only one router down 
message being sent to the problem management server; 
and 

(c) When a networked element other than a router or NIC, 
such as a computer, goes off-line, it will send only one 
"computer down" event to the problem management 
server. 

DETAILED DESCRIPTION OF THE 
INVENTION 

The inventive method is preferably realized as a software 
application, called "Valerie", which integrates with existing 
network management software packages and servers, such as
Netview/6000 and Hewlett-Packard OpenView, on common
network server computer hardware platforms such as an 
IBM RS/6000. 

By the logic of the method, it is assumed that it is not 
likely that multiple failures occur on the same network 
simultaneously. And even if multiple failures are detected or
indicated, certain patterns to the indications allow for diag- 
nosis of a more likely single point of failure. For example, 
if all but one of the computers on network A in FIG. 1 are 
responding to "pings", it is more likely that the non- 
responsive computer is the failure point as the network 
wiring, router NIC, and router are still functioning for the 
other computers on network A. In fact, if even one computer 
on the network responds, it can be assumed that the network 
wiring, NIC and router are functioning correctly. However,
if the pattern of non-responses includes all of the computers 
on a network, then the NIC and the router are suspect. 

So, in the second step of the logic, if any computers on 
any other network connected to the router are responding, 
but all of the computers on just one network are not 
responding, it can be assumed to be a network wiring or NIC 
problem with the non-responding network. But, if no com- 
puters on any networks are responding, then the router can 
be assumed to be the single point of failure.
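
The two-step deduction described above can be sketched in a few lines. This is a minimal illustration and not the Valerie source, which the text says is written in C. The function name `diagnose`, the dictionary-based topology, and the result labels are assumptions made for the example.

```python
from typing import Dict, Set, Tuple


def diagnose(topology: Dict[str, Set[str]],
             responded: Set[str],
             silent: str) -> Tuple[str, str]:
    """Return (kind, name) for the most likely single point of failure.

    `topology` maps each subnetwork name to the addresses on it,
    `responded` holds the addresses that answered their "ping", and
    `silent` is an address that did not answer.
    """
    subnet = next(name for name, members in topology.items()
                  if silent in members)
    # Step 1: a peer on the same network answered, so the wiring, NIC
    # and router are working; the silent element itself is suspect.
    if topology[subnet] & responded:
        return ("element", silent)
    # Step 2: nobody on this network answered, but some other network
    # did, so the router is up and this network or its NIC is suspect.
    others = set().union(*(members for name, members in topology.items()
                           if name != subnet))
    if others & responded:
        return ("nic_or_network", subnet)
    # Nothing answered anywhere: the router is the single point of failure.
    return ("router", "router")


topology = {"A": {"a1", "a2", "a3"}, "B": {"b1", "b2"}, "C": {"c1"}}
one_pc_down = diagnose(topology, {"a1", "a3", "b1", "b2", "c1"}, "a2")
```

Whatever the number of silent elements, the function reports exactly one suspect, which is the property that keeps the alert count down.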

In order to process the non-responses and the responses in 
this logical fashion, the Valerie application must have access
to the connectivity database which describes the topology of 
the networks and computers interconnected by the router, 
and contains the addresses of the computers and other 
network elements. This database is already available from or 
through the network maintenance server, typically through an
application program interface ("API"). In the preferred
embodiment, Valerie is a software application written in
"C", and compiled and targeted for an RS/6000 computer
platform running under the AIX operating system
concurrently with NetView/6000. However, other languages, such
as Java or C++, platforms, such as a Sun Server or
IBM-compatible personal computer, and operating systems, such
as Solaris or Microsoft Windows NT, may be used as the
target system. In any case, the Valerie application program
can access the connectivity database via an API through the
NetView or OpenView application. Valerie can also send
and receive messages using the platform's communication
protocol stack, such as IP, and network interface cards, such
as Ethernet interfaces, as well as monitor for messages on
the network. The integration of Valerie into the overall
network management technology is completed by disabling
the "element down/element up" message output capability
of the NetView or OpenView software, and by enabling the
output of the Valerie logic results. Valerie's logic can be 
triggered by the results of the monitoring activity, or more 
actively by "trapping" the output event from the NetView or 
OpenView software. 

FIG. 4 summarizes the logic of Valerie in a functional
flow depiction. When Valerie is started (41), it reads the
connectivity database and develops rules based on the
network connectivity related to the router. Then, it
periodically sends "ping" messages (43) to each element connected
to the router. Alternatively, it may simply monitor the
network for "pings" from the NetView application to each
network element. These "pings" can be sent at any interval
rate, but are sent at approximately 5 minute intervals in the
preferred embodiment. As long as every response is received
within a determined time limit, such as 5 minutes, the periodic
"pings" continue. But when one or more responses are not
received within the time limit, the logic processing begins.
First, a recent history log is examined (44) to determine if
responses from any other computers on the same network or
router NIC have been received. If so, then a single "element down"
message for the non-responding element or computer is sent
(45) to the problem management server.

If no other responses have been received recently from 
other elements or computers on the same network, then the 
history log is examined (46) to see if responses from any other
computers or elements on any other networks connected to the
router have been received. If so, then the router NIC and/or network
cabling for the non-responsive network is assumed to be
the point of failure, and a single "NIC or network down" 
message is sent (47) to the problem management server. 

However, if no other elements or computers on any of the 
networks connected to the router have responded recently, 
then a single "router down" message is sent (48) to the 
problem management server. 
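
The FIG. 4 flow, including the "recent history" check, can be sketched as follows. The class `HistoryLog`, the function `on_timeout`, the message strings, and the use of a fixed recency window in seconds are all assumptions made for illustration; the text does not specify how recency is measured.

```python
from typing import Dict, Iterable, Set


class HistoryLog:
    """Remembers when each element last replied to a status query."""

    def __init__(self, window: float) -> None:
        self.window = window                  # seconds counted as "recent"
        self.last_reply: Dict[str, float] = {}

    def record(self, address: str, when: float) -> None:
        self.last_reply[address] = when

    def any_recent(self, addresses: Iterable[str], now: float) -> bool:
        return any(now - self.last_reply[a] <= self.window
                   for a in addresses if a in self.last_reply)


def on_timeout(silent: str, topology: Dict[str, Set[str]],
               log: HistoryLog, now: float) -> str:
    """Choose the single message to send when `silent` misses its reply."""
    subnet = next(n for n, members in topology.items() if silent in members)
    peers = topology[subnet] - {silent}
    if log.any_recent(peers, now):                     # step (44)
        return f"element down: {silent}"               # step (45)
    elsewhere = [a for n, members in topology.items()
                 if n != subnet for a in members]
    if log.any_recent(elsewhere, now):                 # step (46)
        return f"NIC or network down: {subnet}"        # step (47)
    return "router down"                               # step (48)


topology = {"A": {"a1", "a2"}, "B": {"b1"}}
log = HistoryLog(window=300.0)
log.record("a1", 1000.0)
log.record("b1", 1000.0)
message = on_timeout("a2", topology, log, now=1100.0)
```

Timestamps are passed in explicitly rather than read from a clock, so the decision can be replayed against a recorded log.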

In this embodiment of the invention, the history log can 
be built and updated by Valerie actively transmitting "pings" 
to network elements and registering the received responses. 
Or, it can be built passively by Valerie monitoring (or 
"snooping") the network for "pings" and responses between 
network elements and the network management software 
application (NetView/6000 or OpenView). 

In an alternate embodiment of the invention, the history 
log is updated by quickly issuing new "pings" to all other 
network elements when a single response is not received
within the time limit. This allows the fault deduction logic 
to operate on more recent data, giving a more accurate result. 
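
A minimal sketch of this refresh step follows. As before, `probe` is a hypothetical callable returning the reply latency in seconds or `None`, and `refresh_responders` is an illustrative name. A shorter time limit is used for the refresh round here, in line with the expedited timer recited in claim 7.

```python
from typing import Callable, Iterable, Optional, Set


def refresh_responders(addresses: Iterable[str],
                       silent: str,
                       probe: Callable[[str], Optional[float]],
                       time_limit: float) -> Set[str]:
    """Re-query every element except `silent` and return those that
    replied within `time_limit` seconds."""
    fresh = set()
    for addr in addresses:
        if addr == silent:
            continue  # the missed reply is what triggered the refresh
        latency = probe(addr)
        if latency is not None and latency <= time_limit:
            fresh.add(addr)
    return fresh


# Simulated latencies: "a2" is the element that missed its reply.
latencies = {"a1": 0.01, "a2": None, "b1": 0.03, "b2": 2.0}
fresh = refresh_responders(latencies, "a2", latencies.get, time_limit=1.0)
```

The returned set can then be fed to the fault deduction logic in place of the older history log entries.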

Finally, turning to FIG. 6, the reduced message bandwidth 
realized by the invention is noticeable. Following the Valerie
processing (62), a single "element down" message is sent to
the problem management server by the enhanced mainte- 
nance server, shown here as NetView/6000 with Valerie. 
It will be understood from the foregoing description that
various modifications and changes may be made in the 
preferred embodiment of the present invention without 
departing from its true spirit, such as the use of alternate
programming methodologies or languages, alternate server 
platforms, various networking protocols, operating systems 
and development tool sets. It is intended that this description 
is for purposes of illustration only and should not be 
construed in a limiting sense. The scope of this invention
should be limited only by the language of the following 
claims. 

What is claimed is: 

1. A method of producing failure alerts in a computer 
network containing a plurality of networked elements 
including at least one network router, at least one network 
management server, and at least one problem management 
server, said router being interconnected to several 
subnetworks, each subnetwork interconnecting several net- 
worked elements, said method comprising the steps of: 

monitoring transmissions via a computer network at least 
one status query message to each of said networked 
elements in said computer network; 

initiating a timer for awaiting receipt of valid status
responses from each networked element in reply to 
each status query message; 

performing a fault tree analysis to determine the most 
likely single point of failure based upon a rule structure 
related to the topology of the computer network, said 
performance of fault tree analysis being invoked by 
expiration of the timer if less than all status responses 
are received; 

transmitting via a computer network to said problem 
management server at least one element failed message 
for said determined single point of failure such that said 
problem management server is notified of the most 
likely point of failure;

receiving via a computer network one or more network 
element failed messages transmitted from said network 
management server; 

selecting one network element failed message based upon 
results of said fault tree analysis; and 

forwarding said selected network element failed message 
to said problem management server via a computer 
network, thereby, blocking the forwarding of all other 
network element failed messages received from the 
network management server from being received by 
said problem management server. 

2. A method of producing failure alerts in a computer 
network as set forth in claim 1 further comprising the steps
of: 

accessing a computer-readable media disposed in said 
network management server to obtain computer net- 
work connectivity and topology data; and 

initiating said rule structure based upon said accessed
computer network connectivity and topological data. 

3. A method of producing failure alerts in a computer 
network as set forth in claim 2, wherein the step of per- 
forming fault tree analysis further comprises the step of 
determining that a single element on a subnetwork is failed 
only if no response has been received from that single 
element and other responses have been received from other 
networked element on the same subnetwork within a pre- 
determined amount of time. 

4. A method of producing failure alerts in a computer 
network as set forth in claim 2, wherein the step of per- 
forming fault tree analysis further comprises the step of 
determining that a router interface, network interface card or
port is failed only if no responses have been received from
any of the networked elements on the subnetwork associated
with that router interface, network interface card or port, and 
only if other responses have been received from other 
networked elements on other subnetworks associated with 
other router interfaces, network interface cards, and ports on 
the same router within a predetermined amount of time. 

5. A method of producing failure alerts in a computer 
network as set forth in claim 2, wherein the step of per- 
forming fault tree analysis further comprises the step of 
determining that a router is failed only if no responses have 
been received from any networked elements on any
subnetworks associated with any of the router's interfaces, network
interface cards, and ports within a predetermined amount of 
time. 

6. A method of producing failure alerts in a computer 
network as set forth in claim 1, further comprising the 
following steps after expiration of the timer and prior to 
performance of the fault tree analysis: 

immediately retransmitting all status query messages to 
all networked elements upon the expiration of the 
timer; and 

re-initiating a timer for awaiting receipt of valid status 
responses from each networked element in reply to 
each retransmitted status query message, such that said 
step of performing fault tree analysis may be performed 
using a set of recently received responses from the 
networked elements. 

7. A method of producing failure alerts in a computer 
network as set forth in claim 6, wherein said re-initiated 
timer is set for an expedited expiration, its expiration value 
being significantly shorter than the value of its normally 
initiated value. 

8. A computer program product for use with network 
management server in a computer network, said computer 
network containing a plurality of networked elements
including at least one network router, at least one network 
management server, and at least one problem management 
server, said router being interconnected to several 
subnetworks, each subnetwork interconnecting several net- 
worked elements, said computer program product compris- 
ing: 

a computer usable medium having computer readable 
program code means embodied in said medium for 
monitoring transmissions via a computer network at 
least one status query message to each of said net- 
worked elements in said computer network; 

a computer usable medium having computer readable 
program code means embodied in said medium for 
initiating a timer for awaiting receipt of valid status 
responses from each networked element in reply to 
each status query message; 

a computer usable medium having computer readable 
program code means embodied in said medium for 
performing a fault tree analysis to determine the most 
likely single point of failure based upon a rule structure 
related to the topology of the computer network, said 
performance of fault tree analysis being invoked by
expiration of the timer if less than all status responses
are received;

a computer usable medium having computer readable
program code means embodied in said medium for 
transmitting via a computer network to said problem 
management server at least one element failed message 
for said determined single point of failure such that said 



13,634 Bl 

8 

problem management server is notified of the most 
likely point of failure; 
a computer usable medium having computer readable 
program code means embodied in said medium for 
5 receiving via a computer network one or more network 
element failed messages transmitted from said network 
management server; 
a commuter usable medium having computer readable 
program code means embodied in said medium for 
JO selecting one network element failed message based 
upon results of said fault tree analysis; and 
a computer usable medium having computer readable 
program code means embodied in said medium for 
forwarding said selected network element failed mes- 
15 sage to said problem management server , via a com- 
puter network, thereby blocking the forwarding of all 
other network element failed messages received from 
the network management server from being received by 
said problem management server. 
20 9. A computer program product for use with network 
management server in a computer network as set forth in 
claim 8 further comprising: 

a computer usable medium having computer readable
program code means embodied in said medium for
accessing a computer-readable media disposed in said
network management server to obtain computer
network connectivity and topology data; and

a computer usable medium having computer readable
program code means embodied in said medium for
initiating said rule structure based upon said accessed
computer network connectivity and topological data.

10. A computer program product for use with network
management server in a computer network as set forth in
claim 8 wherein the computer readable code for performing
fault tree analysis further comprises computer readable
program code means embodied in said medium for
determining that a single element on a subnetwork is failed only
if no response has been received from that single element
and other responses have been received from other
networked element on the same subnetwork within a
predetermined amount of time.

11. A computer program product for use with network
management server in a computer network as set forth in
claim 8 wherein the computer readable code for performing
fault tree analysis further comprises computer readable
program code means embodied in said medium for
determining that a router interface, network interface card or port
is failed only if no responses have been received from any
of the networked elements on the subnetwork associated
with that router interface, network interface card or port, and
only if other responses have been received from other
networked elements on other subnetworks associated with
other router interfaces, network interface cards, and ports on
the same router within a predetermined amount of time.

12. A computer program product for use with network
management server in a computer network as set forth in
claim 8 wherein the computer readable code for performing
fault tree analysis further comprises computer readable
program code means embodied in said medium for
determining that a router is failed only if no responses have been
received from any networked elements on any subnetworks
associated with any of the router's interfaces, network
interface cards, and ports within a predetermined amount of
time.

13. A computer program product for use with network
management server in a computer network as set forth in
claim 8, further comprising:

a computer usable medium having computer readable
program code means embodied in said medium for
immediately retransmitting all status query messages to
all networked elements upon the expiration of the
timer; and

a computer usable medium having computer readable
program code means embodied in said medium for
re-initiating a timer for awaiting receipt of valid status
responses from each networked element in reply to
each retransmitted status query message, such that said
fault tree analysis may be performed using a set of
recently received responses from the networked
elements.

14. A network management server system for producing 
failure alerts in a computer network, said computer network 15 
having at least one network router interconnected to several 
subnetworks, a plurality of networked elements intercon- 
nected via said subnetworks and to said network routers, and 

at least one problem management server for escalation of 
failure alerts and notification of failxu^es to maintenance 20 
personnel, said network management server system com- 
prising: 

a network server including a computer hardware platform 
with a processor and computer-readable mediimi for 
storing data and program code, a network communi- ^ 
cations protocol stack, a network management software 
suite, and at least one means for commimication to 
networked elements, router and problem management 
server via said computer network; 

a status monitor which monitors status replies from said 
networked elements made in response to status queries 
from said network management software suite; 

a failure analyzer invoked by said network management 
software suite upon the failure to receive one or more 35 
status replies from said networked elements, said fail- 
ure analyzer performing fault tree analysis to determine 
the most likely point of failure in the computer net- 
work; 

a problem management server notifier which transmits a 40 
network element failed message to the problem man- 
agement server via a computer network, said network 
element failed message including an indicator corre- 
sponding to said most likely point of failure as deter- 
mined by the failure analyzer; and 45 

a message forwarder which receives via a computer 
network one or more network element failed messages 
transmitted from said network management server; 
selects one network element failed message based upon 
residts of said fault tree analysis; and forwards said 50 
selected network element failed message to said prob- 
lem management server via a computer network 
thereby blocking the forwarding of all other network 
element failed messages received from the network 
management server from being received by said problem
management server. 
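
The selection step performed by the message forwarder of claim 14 can be illustrated with a minimal sketch. It assumes each network element failed message is a dictionary with an "element" key and that the fault tree result is available as a single identifier; the function name `forward_selected`, the message layout, and the `send` callback are hypothetical and do not appear in the claims.

```python
from typing import Callable, Dict, Iterable


def forward_selected(
    messages: Iterable[Dict[str, str]],
    most_likely: str,
    send: Callable[[Dict[str, str]], None],
) -> int:
    """Forward one message naming the most likely point of failure.

    Every other message is blocked. Returns the number of blocked messages.
    The message layout ({"element": ...}) is an assumption for illustration.
    """
    forwarded = False
    blocked = 0
    for msg in messages:
        if not forwarded and msg["element"] == most_likely:
            send(msg)          # the single selected failure report
            forwarded = True
        else:
            blocked += 1       # redundant or spurious report, not forwarded
    return blocked
```

Passing a list's `append` method as `send` is enough to observe which message would reach the problem management server.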

15. A network management server system for producing 
failure alerts in a computer network as set forth in claim 14, 
wherein said failure analyzer further comprises: 

a set of rules for determining the most likely point of 
failure based upon a predetermined topological inter- 
relationship between the networked elements, the 
subnetworks, and the routers and their interfaces to the 
subnetworks; and 

a comparator which applies the rules to a set of informa- 
tion containing all the status replies received from 
networked elements within a predetermined time
period, said comparator producing an output corre- 
sponding to a most likely point of failure of the 
network.

16. A network management server system for producing 
failure alerts in a computer network as set forth in claim 15, 
wherein said set of rules comprise a rule that declares a 
networked element to be failed only if no status reply from
the networked element is found in the set of information 
being analyzed by the analyzer, and only if at least one status 
reply from any other networked element on the same sub-
network is found in the set of information being analyzed by
the analyzer. 

17. A network management server system for producing 
failure alerts in a computer network as set forth in claim 15, 
wherein said set of rules comprise a rule that declares a 
suspect network router interface, network interface card, and
port to be failed only if no status reply from any networked 
element on the subnetwork associated with the suspect 
network router interface, network interface card, and port is 
found in the set of information being analyzed by the 
analyzer, and only if at least one status reply from any other 
networked element on any other subnetwork associated with 
any other router interface, network interface card, and port 
on the same network router is found in the set of information 
being analyzed by the analyzer. 

18. A network management server system for producing 
failure alerts in a computer network as set forth in claim 15, 
wherein said set of rules comprise a rule that declares a 
suspect network router to be failed only if no status reply 
from any networked element on any subnetwork associated with any
network interface card or port associated with the suspect
network router is found in the set of information being analyzed by
the analyzer. 
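
The three rules recited in claims 16 through 18 can be read together as a small decision procedure. The following is a minimal sketch under stated assumptions, not the claimed implementation: the topology is assumed to be a mapping from router to interface to the elements on that interface's subnetwork, and the set of status replies is assumed to be a set of element names. The names `Topology` and `diagnose` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical topology layout: router -> interface -> elements on that subnetwork.
Topology = Dict[str, Dict[str, List[str]]]


def diagnose(topology: Topology, replied: Set[str]) -> List[Tuple[str, str]]:
    """Return (kind, name) pairs for the most likely points of failure."""
    failures: List[Tuple[str, str]] = []
    for router, interfaces in topology.items():
        # A subnetwork counts as alive if at least one of its elements replied.
        alive = {
            iface: any(e in replied for e in elements)
            for iface, elements in interfaces.items()
        }
        if not any(alive.values()):
            # Claim 18: no reply from any subnetwork behind this router.
            failures.append(("router", router))
            continue
        for iface, elements in interfaces.items():
            if not alive[iface]:
                # Claim 17: silent subnetwork while a sibling subnetwork replied.
                failures.append(("interface", iface))
            else:
                # Claim 16: silent element while a peer on the same subnetwork replied.
                failures.extend(
                    ("element", e) for e in elements if e not in replied
                )
    return failures
```

Because the router check runs first and the interface check runs before the element check, a silent subnetwork yields one interface report rather than one report per element, which is the reduction in redundant alerts that the rules aim for.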

19. A network management server system for producing 
failure alerts in a computer network as set forth in claim 14 
further comprising a status refresher which immediately 
transmits a status query message to each networked element 
upon the invocation of the failure analyzer in order to update 
the set of replies received and allow analysis on more recent 
status of the network to be performed. 

20. A network management server system for producing 
failure alerts in a computer network as set forth in claim 14 
wherein said status monitor, fault analyzer and problem 
management server notifier are application programs inter-
faced to a standard network management server software
suite. 

21. A network management server system for producing 
failure alerts in a computer network as set forth in claim 20 
wherein said application programs are "C" programs com-
piled and targeted for execution by said computer hardware
platform. 

22. A network management server system for producing 
failure alerts in a computer network as set forth in claim 20 
wherein said standard network management server software 
suite is a NetView suite.

23. A network management server system for producing 
failure alerts in a computer network as set forth in claim 20 
wherein said standard network management server software 
suite is an OpenView suite. 

24. A network management server system for producing 
failure alerts in a computer network as set forth in claim 20 
wherein said computer hardware platform is an RS/6000 
computer platform running an AIX operating system, both 
of which are International Business Machines products. 



